Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks

Abstract

Keyword and keyphrase extraction is an important problem in natural language processing, with applications ranging from summarization to semantic search to document clustering. Graph-based approaches to keyword and keyphrase extraction avoid the problem of acquiring a large in-domain training corpus by applying variants of the PageRank algorithm to a network of words. Although graph-based approaches are knowledge-lean and easily adoptable in online systems, it remains largely open whether they can benefit from centrality measures other than PageRank. In this paper, we experiment with an array of centrality measures on word and noun phrase collocation networks, and analyze their performance on four benchmark datasets. Not only are there centrality measures that perform as well as or better than PageRank, but they are much simpler (e.g., degree, strength, and neighborhood size). Furthermore, centrality-based methods give results that are competitive with and, in some cases, better than two strong unsupervised baselines.
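The abstract names the ingredients (a word collocation network, plus centrality measures such as degree, strength, neighborhood size, and PageRank) but not the exact construction. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: two words are linked when they co-occur within a sliding window of `window` tokens (default 2, i.e. adjacent words after stopword removal), edge weights count co-occurrences, "neighborhood size" is taken to mean the number of nodes within two hops, and PageRank is the standard weighted power-iteration variant. The tiny stopword list and all function names (`tokenize`, `collocation_network`, `neighborhood_size`, `pagerank`, `rank_keywords`) are hypothetical, and the noun phrase networks mentioned above are not modeled.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def collocation_network(tokens, window=2):
    """Undirected weighted graph as {word: {neighbor: weight}}.

    Assumption: two distinct words are linked when they occur within `window`
    consecutive tokens of each other; the weight counts how often that happens.
    """
    weights = defaultdict(float)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if u != v:
                weights[tuple(sorted((u, v)))] += 1.0
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def neighborhood_size(adj, node, order=2):
    """Number of distinct nodes within `order` hops of `node`, excluding itself."""
    seen, frontier = {node}, {node}
    for _ in range(order):
        frontier = {n for f in frontier for n in adj.get(f, {})} - seen
        seen |= frontier
    return len(seen) - 1


def pagerank(adj, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank by power iteration.

    Every node in `adj` has at least one edge, so there are no dangling nodes.
    """
    nodes = list(adj)
    n = len(nodes)
    strength = {u: sum(adj[u].values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {
            u: (1 - damping) / n
            + damping * sum(rank[v] * w / strength[v] for v, w in adj[u].items())
            for u in nodes
        }
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:  # stop early once the ranks have converged
            break
    return rank


def rank_keywords(text, measure="degree", window=2, top_k=5):
    """Return the top_k words ranked by the chosen centrality measure."""
    adj = collocation_network(tokenize(text), window)
    if measure == "degree":            # number of distinct neighbors
        scores = {u: len(nbrs) for u, nbrs in adj.items()}
    elif measure == "strength":        # sum of incident edge weights
        scores = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    elif measure == "neighborhood":    # nodes reachable within two hops
        scores = {u: neighborhood_size(adj, u) for u in adj}
    elif measure == "pagerank":
        scores = pagerank(adj)
    else:
        raise ValueError(f"unknown measure: {measure}")
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))[:top_k]
```

For example, `rank_keywords(abstract_text, measure="strength", top_k=10)` returns the ten words with the highest weighted degree. Degree and strength need only a single pass over the edges, whereas PageRank iterates until convergence, which is one reason the simpler measures are attractive if they rank words comparably.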

